Abstract. Software systems security represents a major concern as cyber-attacks
continue to grow in number and sophistication. In addition to the increasing
complexity and interconnection of modern information systems, these systems
run significantly similar software. This is known as IT monoculture. As a
consequence, software systems share common vulnerabilities, which enable the
spread of malware. The principle of diversity can help in mitigating the
negative effects of IT monoculture on security. One important category of
diversity-based software approaches for security purposes focuses on enabling
efficient and effective dynamic monitoring of software system behavior in
operation. In this paper, we briefly present these approaches and propose a
new approach which aims at dynamically generating a diverse set of lightweight
traces. We initiate the discussion of some research issues which will be the
focus of our future research work.

1 Introduction

Security remains an extremely critical issue. This is evidenced by the
continuous growth of cyber threats [22]. Cyber-attacks are increasing not only
in number but also in sophistication and scale. Some attacks are now of
nation-state class [24]. This observation can be explained by a combination of
contributing factors, which include the following.

The increasing complexity of software systems makes it difficult to produce
fault-free software even though different quality controls are often part of
the software development process. These residual faults constitute dormant
vulnerabilities, which eventually end up being discovered by determined
malicious opponents and exploited to carry out cyber-attacks. Moreover,
software systems are distributed and interconnected through open networks in
order to communicate control and data. This in turn tremendously increases the
risk of attacks. Most importantly, information systems run significantly
similar software. This is called IT monoculture [17]. On one hand, IT
monoculture presents
several advantages, including easier management, fewer configuration errors,
and support for interoperability. On the other hand, IT monoculture raises
serious security concerns because similar systems share common
vulnerabilities, which facilitates the spread of viruses and malware.

The principle of diversity can be used to mitigate the effects of IT
monoculture on software systems. Diversity has been used to complement
redundancy in order to achieve software system reliability and fault
tolerance. When it comes to security, the diversity-based approach seeks
specifically to reduce the common vulnerabilities between redundant components
of a system. As a result, it becomes very difficult for a malicious opponent
to design a single attack that is able to exploit different vulnerabilities in
the system components simultaneously. Therefore, the resistance of the system
to cyber-attacks is increased. Moreover, the ability to build a system out of
redundant and diverse components provides an opportunity to monitor the system
by comparing the dynamic behavior of the diverse components when presented
with the same input. This endows the system with an efficient intrusion
detection capability.

In this paper, we focus on how diversity can be used to dynamically generate a
diverse set of lightweight traces for the same behavior of a software system.
To this end, we define a setting which allows running several instances of a
process in parallel. All these instances are provided with the same input.
Each of these process instances runs on top of an operating system kernel
which is instrumented differently to provide traces of the system calls
pertaining to different important functionalities of the kernel. We raise in
this paper some research questions that need to be addressed.

The remainder of the paper is organized as follows. In Section 2, we introduce
the main idea underlying the approaches using software diversity for security
purposes. We devote Section 3 to reviewing and evaluating the state-of-the-art
approaches based on software diversity to mitigate the risk associated with IT
monoculture. In Section 4, we outline and discuss an approach which aims at
enabling the dynamic generation of a diverse set of lightweight and
complementary traces from a running software application. We conclude the
paper in Section 5.


2 Diversity as a Software Security Enabler

Redundancy is a traditional means to achieve fault tolerance and higher system
reliability. This has proven to be valid mainly for hardware because of the
failure independence assumption, as hardware failures are typically due to
random faults. Therefore, the replication of components provides added
assurance. When it comes to software, however, failures are due to design
and/or implementation faults. As a result, such faults are embedded within the
software and their manifestation is systematic: every replica of the same
software fails in the same way. Therefore, redundancy alone is not effective
against software faults.

Faults embedded in software represent potential vulnerabilities, which can be
exploited by external malicious interaction faults (i.e., attacks) [2]. These
attacks can ultimately enable the violation of the system security properties
(i.e., a security failure) [2]. Therefore, the diversity principle can
potentially be used for security purposes. First, diversity can be used to
decrease the common vulnerabilities. This is achieved by building a software
system out of a set of diverse but functionally equivalent components. This in
turn makes it very difficult for a malicious opponent to compromise all the
components of a system with the very same attack. Second, the ability to build
a system out of redundant and diverse components provides an opportunity to
monitor the system by comparing the dynamic behavior of the diverse components
when presented with the same input. This endows the system with an efficient
intrusion detection capability.

Diversity has therefore naturally caught the attention of the software
security research community. The seminal work presented by Forrest et al. in
[11] promotes the general philosophy of system security using diversity. The
authors argue that uniformity represents a potential weakness because any flaw
or vulnerability in an application is replicated on many machines. The
security and the robustness of a system can be enhanced through the deliberate
introduction of diversity. Deswarte et al. review in [9] the different levels
of diversity of software and hardware systems and distinguish different
dimensions and different degrees of diversity. Bain et al. [3] presented a
study to understand the effects of diversity on the survivability of systems
faced with a set of widespread computer attacks, including the Morris worm,
the Melissa virus, and the LoveLetter worm. Taylor and Alves-Foss report in
[23] on a discussion held by a panel of renowned researchers about the use of
diversity as a strategy for computer security and the main open issues
requiring further research. It emerges from this discussion that there is a
lack of quantitative information on the cost associated with diversity-based
solutions and a lack of knowledge about the extent of protection provided by
diversity.


3 Diversity-based Approaches to Software Security

We have undertaken a comprehensive study to evaluate the state-of-the-art
approaches based on the principle of software diversity to mitigate the risk
of IT monoculture and enable software security [14]. These approaches can be
classified into the following three main categories.


3.1 System Integration and Middleware

This category includes proposals of software architectures which deploy
redundancy combined with software diversity, either by integrating multiple
Commercial-Off-The-Shelf (COTS) applications coordinated through a proxy
component or by defining and using a middleware to achieve the same purpose.

The software architectures described in this section implement the
architectural pattern depicted in Figure 1. This approach is well suited to
the integration of COTS components or legacy and closed applications which
deliver the services. The servers are shielded from the user side through
proxies. Monitoring and voting mechanisms are used to check the health of the
system, validate the results, and detect abnormal behavior. Examples of this
approach include the Dependable Intrusion Tolerance (DIT) architecture [10]
[26], the Scalable Intrusion Tolerant Architecture (SITAR) [28], and
Hierarchical Adaptive Control for QoS Intrusion Tolerance (HACQIT) [19].


Fig. 1. General Pattern of Intrusion Tolerance Architecture


Middleware-based approaches are much richer since they can provide server
coordination between multiple "diverse" applications while hiding the
sub-system differences [20]. Several intrusion-tolerant software architectures
are part of this category. The Intrusion Tolerance by Unpredictable Adaptation
(ITUA) architecture is a distributed object framework which integrates several
mechanisms to enable the defense of critical applications [18]. The objective
of this architecture is to enable the tolerance of sophisticated attacks
aiming at corrupting a system. Malicious and Accidental Fault Tolerance for
Internet Applications (MAFTIA) [27] is a European research project which
targeted the objective of systematically investigating the tolerance paradigm
in order to build large-scale dependable distributed applications. The
Designing Protection and Adaptation into a Survivability Architecture (DPASA)
[1] [7] is a survivability architecture providing a diverse set of defense
mechanisms. In this architecture, diversity is used to achieve defense in
depth and a multi-layer security approach [7]. This architecture relies on a
robust network infrastructure which supports redundancy and provides security
services such as packet filtering, source authentication, link-level
encryption, and network anomaly sensors. The detection of violations triggers
defensive responses provided by middleware components in the architecture.
Fault/intrusiOn REmoVal through Evolution and Recovery (FOREVER) [5] is a
service which is used to enhance the resilience of intrusion-tolerant
replicated systems. FOREVER achieves this goal through the combination of
recovery and evolution. FOREVER allows a system to recover from malicious
attacks or faults using time-triggered or event-triggered recoveries.

3.2 Software Diversity through Automated Program
Transformations

Diversity can be introduced in the software ecosystem by applying automatic
program transformations, which preserve the functional behavior and the
programming language semantics. These transformations essentially consist in
randomizing the code, the address space layout, or both in order to provide a
probabilistic defense against unknown threats. Three main techniques can be
used to randomize software.

The Instruction Set Randomization (ISR) technique [4] [16] changes the
instruction set of the processor so that unauthorized code will not run
successfully. The main idea underlying ISR is to decrease the attacker's
knowledge about the language used by the runtime environment on which the
target application runs. ISR techniques aim at defending against code
injection attacks, which consist in introducing executable code within the
address space of a target process, and then passing control to the injected
code. Code injection attacks can succeed only when the injected code is
compatible with the execution environment.
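
The effect of ISR on injected code can be illustrated with a toy model. The
sketch below is not the mechanism of [4] or [16]: the four-opcode instruction
set, the single-byte XOR key, and the names new_key, scramble, and execute are
all assumptions made for illustration. Legitimate code is scrambled with the
per-process key at load time, while injected code arrives unscrambled and
therefore decodes to something the toy processor does not recognize.

```python
import secrets

# Toy instruction set (hypothetical): opcode byte -> mnemonic.
VALID_OPCODES = {0x01: "LOAD", 0x02: "ADD", 0x03: "STORE", 0x04: "HALT"}


def new_key() -> int:
    """Pick a non-zero per-process key in the range 1..255."""
    return secrets.randbelow(255) + 1


def scramble(code: bytes, key: int) -> bytes:
    """Encode legitimate code with the process key at load time."""
    return bytes(b ^ key for b in code)


def execute(memory: bytes, key: int) -> list:
    """Decode each byte with the key just before 'executing' it.

    Raises ValueError on the first byte that does not decode to a
    valid opcode, which models an illegal-instruction fault.
    """
    decoded = []
    for b in memory:
        op = VALID_OPCODES.get(b ^ key)
        if op is None:
            raise ValueError("illegal instruction")
        decoded.append(op)
    return decoded


if __name__ == "__main__":
    key = new_key()
    program = bytes([0x01, 0x02, 0x03, 0x04])
    print(execute(scramble(program, key), key))  # legitimate code runs
    try:
        execute(program, key)  # injected, unscrambled code
    except ValueError as err:
        print("injected code rejected:", err)
```

A one-byte key is far too small for a real defense; it is used here only to
keep the example short.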

Address Space Randomization (ASR) [21] is used to increase software resistance
to memory corruption attacks. These are designed to exploit memory
manipulation vulnerabilities such as stack and heap overflows and underflows,
format string vulnerabilities, array index overflows, and uninitialized
variables. ASR basically consists in randomizing the different regions of the
process address space, such as the stack and the heap. It is worth noting that
ASR has been integrated into the default configuration of the Windows Vista
operating system [30].

Data Space Randomization (DSR) is a different randomization-based approach
which also aims at defending against memory error exploits [6]. In particular,
DSR randomizes the representation of data objects. This is often implemented
by applying a modification to the data representation, such as XORing each
data object in memory with a randomly chosen mask value. The data are unmasked
right before being used. This makes the results of using corrupted data highly
unpredictable. The DSR technique seems to have advantages over ASR, as it
provides a broader range of randomization: on 32-bit architectures, integers
and pointers are randomized over a range of 2^32 values. In addition, DSR is
able to randomize the relative distance between two data objects, addressing a
weakness of the ASR technique.
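
The masking step described above can be sketched as follows. This is a
minimal illustration under stated assumptions, not the implementation of [6]:
the class name MaskedWord, its methods, and the choice of one mask per object
are hypothetical, and a real DSR system applies the transformation at compile
time rather than through a wrapper object.

```python
import secrets

WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1


class MaskedWord:
    """A 32-bit value kept in 'memory' XORed with a per-object random mask."""

    def __init__(self, value: int):
        # One mask drawn uniformly from the 2^32 possible values.
        self.mask = secrets.randbits(WORD_BITS)
        self.stored = (value & WORD_MASK) ^ self.mask

    def store(self, value: int) -> None:
        """Legitimate write: mask the value before it reaches memory."""
        self.stored = (value & WORD_MASK) ^ self.mask

    def load(self) -> int:
        """Legitimate read: unmask right before use."""
        return self.stored ^ self.mask


if __name__ == "__main__":
    ptr = MaskedWord(0x0804A000)
    print(hex(ptr.load()))  # the program sees the original value

    # A memory error overwrites the raw representation, bypassing store().
    ptr.stored = 0x41414141
    # The program now reads 0x41414141 XOR mask, which the attacker
    # cannot predict without knowing the mask.
    print(hex(ptr.load()))
```

Because the attacker's write bypasses the masking step, the value that the
program later unmasks differs from the value the attacker intended, unless
the mask happens to be guessed.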


3.3 Dynamic Behavior Monitoring


The ability to build a system combining redundant and diverse components
provides powerful new capabilities in terms of advanced monitoring of the
redundant system by comparing the behavior of the diverse replicas. This
endows the system with efficient intrusion detection capabilities not
achievable with standard intrusion detection techniques based on signatures or
malware modeling. Moreover, with the introduction of some assessment of the
behavioral advantages of one implementation over the others, a
"meta-controller" can ultimately adapt the system behavior or its structure
over time.

Several experimental systems have used output voting to detect some types of
server compromise. For example, the HACQIT system [19] uses the status codes
of the server replica responses. If the status codes are different, the system
detects a failure. Totel et al. [25] extend this work to do a more detailed
comparison of the replica responses. They observed that web server responses
may be slightly different even when there is no attack, and proposed a
detection algorithm to detect intrusions with a higher accuracy (lower false
alarm rate). These research initiatives specifically target web servers and
analyze only server responses. Consequently, they cannot consistently detect
compromised replicas.

N-variant systems provide a framework which allows executing a set of
automatically diversified variants using the same input [8]. The framework
monitors the behavior of the variants in order to detect divergences. The
variants are built so that an anticipated type of exploit can succeed on only
one variant. Therefore, such exploits become detectable. Building the variants
requires a special compiler or a binary rewriter. Moreover, this framework
detects only anticipated types of exploits, against which the replicas are
diversified.

Multi-variant code execution is a runtime monitoring technique which prevents
malicious code execution [29]. This technique uses diversity to protect
against malicious code injection attacks. This is achieved by running several
slightly different variants of the same program in lockstep. The behavior of
the variants is compared at synchronization points, which are in general
system calls. Any divergence in behavior is suggestive of an anomaly and
raises an alarm.

The behavioral distance approach aims at detecting sophisticated attacks which
manage to emulate the original system behavior, including returning the
correct service response (also known as mimicry attacks). These attacks are
thus able to defeat traditional anomaly-based intrusion detection systems
(IDS). Behavioral distance achieves this defense using a comparison between
the behaviors of two diverse processes running the same input. It measures the
extent to which the two processes behave differently. Gao et al. proposed two
approaches to compute such measures [12] [13].
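
The three kinds of comparison surveyed in this section can be sketched in a
few lines. The code below is illustrative only and all function names are
hypothetical. status_codes_agree mirrors the status-code check attributed to
HACQIT, first_divergence mirrors a lockstep comparison at system call
synchronization points, and edit_distance is a plain Levenshtein distance
used here as a stand-in for a behavioral distance; it is not one of the
measures proposed in [12] [13].

```python
from itertools import zip_longest


def status_codes_agree(codes) -> bool:
    """Output voting on status codes: any disagreement signals a failure."""
    return len(set(codes)) <= 1


def first_divergence(variant_traces):
    """Compare variants at each synchronization point (one system call each).

    Returns the index of the first point where the variants disagree,
    or None if all traces are identical. A variant that stops early
    counts as diverging at the point where it stops.
    """
    for index, calls in enumerate(zip_longest(*variant_traces)):
        if len(set(calls)) > 1:
            return index
    return None


def edit_distance(trace_a, trace_b) -> int:
    """Levenshtein distance between two system call sequences."""
    previous = list(range(len(trace_b) + 1))
    for i, call_a in enumerate(trace_a, start=1):
        current = [i]
        for j, call_b in enumerate(trace_b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (call_a != call_b),  # substitution
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    healthy = ["open", "read", "write", "close"]
    hijacked = ["open", "read", "execve", "close"]
    print(status_codes_agree([200, 200, 500]))
    print(first_divergence([healthy, healthy, hijacked]))
    print(edit_distance(healthy, hijacked))
```

Exact comparison suits variants of the same program, whose system calls
should match; a distance is needed when the replicas are different
implementations whose normal behaviors already differ.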

4 Towards a Diversity-based Approach for Dynamic
Generation of Lightweight Traces

The comprehensive dynamic monitoring of an operating system kernel such as the
Linux kernel is a daunting and challenging task. Indeed, it yields massive
traces which are very difficult to deal with and, in particular, to abstract
correctly in order to systematically reach meaningful information [15]. The
principle of diversity can potentially be leveraged to address this issue. The
main idea is to deploy a set of redundant Linux nodes running in parallel.
This set can also deliberately include a subset of replicas that are
purposefully vulnerable. The diversity is introduced by the fact that the
replicas are monitored differently. Indeed, the monitoring of each Linux
kernel replica focuses on a different (predetermined) perspective. These
perspectives include the main kernel services such as memory management, file
system management, networking sockets, interrupts, etc.


Fig. 2. Architecture for Diversity-based Dynamic Monitoring


The general software system architecture outlined in Figure 2 aims at enabling
the dynamic generation of a diverse set of traces of the behavior of a
software application. Each of these traces is a sub-trace of the whole trace
of the software application and reflects a particular functionality of the
operating system kernel. For a software application deployed in this setting,
N processes are spawned and run in parallel. Each of these processes runs in
the environment of an operating system where the kernel has been instrumented
to provide the trace of a particular functionality, such as memory management,
file management, networking management, input/output drivers, etc. All the
running instances of the application are provided with the same input, which
might be a malicious input (i.e., an attack). The generated traces are
collected by a monitoring entity which is in charge of analyzing and
correlating them using techniques that need to be investigated, as discussed
below.
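
The setting described above can be mimicked in miniature. The sketch below is
an assumption-laden illustration rather than the proposed system: the
SUBSYSTEM_OF table is a hypothetical and deliberately small mapping from
system call names to kernel functionalities, each replica is reduced to a
filter over one shared event list, and the list position serves as a logical
timestamp. In the real setting the replicas are separate processes, so
aligning their events is itself one of the open questions.

```python
# Hypothetical mapping from system call name to the kernel functionality
# that a given replica is instrumented to trace.
SUBSYSTEM_OF = {
    "brk": "memory", "mmap": "memory", "munmap": "memory",
    "open": "file", "read": "file", "write": "file", "close": "file",
    "socket": "network", "connect": "network", "sendto": "network",
}


def replica_trace(events, subsystem):
    """Sub-trace recorded by the replica instrumented for one subsystem.

    Each entry is (logical_time, syscall_name); events belonging to other
    subsystems are simply not recorded by this replica.
    """
    return [(t, name) for t, name in enumerate(events)
            if SUBSYSTEM_OF.get(name) == subsystem]


def collect(events, subsystems):
    """Monitoring entity: gather one lightweight sub-trace per replica."""
    return {s: replica_trace(events, s) for s in subsystems}


def merge(sub_traces):
    """Rebuild the order of the traced events from the sub-traces."""
    ordered = sorted(entry for trace in sub_traces.values() for entry in trace)
    return [name for _, name in ordered]


if __name__ == "__main__":
    behavior = ["brk", "open", "read", "socket", "connect", "mmap", "close"]
    traces = collect(behavior, ["memory", "file", "network"])
    for subsystem, trace in traces.items():
        print(subsystem, trace)
    print(merge(traces) == behavior)
```

Each sub-trace is shorter than the full event list, and together the
sub-traces cover every event whose system call appears in the mapping.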

This dynamic monitoring configuration would yield a diverse set of much more
lightweight traces. The latter are sub-traces of the comprehensive trace of
the running application. We are interested in investigating several research
questions which can be raised using this monitoring setting. These include the
identification of correlations between the different traces both in normal
situations (i.e., a healthy and secure system) and in abnormal situations (a
system under attack or intrusion), as well as the identification of malicious
behavior patterns.

5 Conclusion

Software systems security is a critical issue. An important contributing
factor to this issue is the significant similarity in the software used in
such systems. This is called IT monoculture. This issue can be mitigated by
using diversity, which aims at reducing the common vulnerabilities and
consequently increasing the difficulty of breaking systems built with
diversity in mind.

In this article, we focus on how diversity can be deployed to enable the
dynamic monitoring of software behavior for the purpose of intrusion
detection. We have presented a diversity-based approach which aims at
dynamically generating traces pertaining to different functionalities of the
operating system kernel. These traces, which are sub-traces of the
comprehensive trace of the software application behavior, are therefore
smaller. We are interested in investigating the different correlations and
patterns that can be discovered between these sub-traces both when the
software application is healthy and secure and when it is compromised.